feat(strands-py): add GoalLoop vended plugin with docs by notowen333 · Pull Request #2738 · strands-agents/harness-sdk

notowen333 · 2026-06-11T18:08:40Z

Description

Agents often need iterative refinement — retry until the response meets a quality bar. Today that means hand-rolling a loop with timeout logic, attempt tracking, feedback injection, and state management. GoalLoop encapsulates all of that as a vended plugin that works inside the existing hook lifecycle.
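For context, here is a rough sketch of the kind of hand-rolled loop the description refers to. It does not use the plugin or the SDK at all: `refine_until_valid`, `fake_agent`, and `at_most_three_words` are hypothetical names invented for illustration, and the order in which the budgets are checked is an arbitrary choice here, not a statement about how GoalLoop behaves.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical stand-ins: `generate` plays the agent, `validate` plays the goal check.
Generate = Callable[[str], str]
Validate = Callable[[str], Tuple[bool, Optional[str]]]


def refine_until_valid(
    prompt: str,
    generate: Generate,
    validate: Validate,
    max_attempts: int = 3,
    timeout: float = 30.0,
) -> dict:
    """Retry `generate` until `validate` passes or a budget runs out."""
    deadline = time.monotonic() + timeout
    attempts = []
    current_prompt = prompt
    while True:
        response = generate(current_prompt)
        passed, feedback = validate(response)
        # Attempt tracking: record every try, pass or fail.
        attempts.append(
            {"attempt": len(attempts) + 1, "passed": passed, "feedback": feedback}
        )
        if passed:
            return {"passed": True, "stop_reason": "satisfied", "attempts": attempts}
        if len(attempts) >= max_attempts:
            return {"passed": False, "stop_reason": "max_attempts", "attempts": attempts}
        if time.monotonic() >= deadline:
            return {"passed": False, "stop_reason": "timeout", "attempts": attempts}
        # Feedback injection: tell the next attempt what to fix.
        current_prompt = f"{prompt}\n\nPrevious attempt failed: {feedback}"


def fake_agent(prompt: str) -> str:
    # Toy agent: shortens its answer once it sees feedback in the prompt.
    return "short answer" if "failed" in prompt else "a very long rambling answer indeed"


def at_most_three_words(text: str) -> Tuple[bool, Optional[str]]:
    count = len(text.split())
    return (True, None) if count <= 3 else (False, f"Too long ({count} words).")


result = refine_until_valid("Explain rainbows.", fake_agent, at_most_three_words)
print(result["stop_reason"], len(result["attempts"]))
```

Every caller that wants this behavior has to repeat the budget checks and the bookkeeping, which is the duplication the plugin is meant to remove.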

This PR ports the GoalLoop plugin from TypeScript to Python and adds a dual-language documentation page.

Public API Changes

New module: strands.vended_plugins.goal

from strands import Agent
from strands.vended_plugins.goal import GoalLoop

# Natural-language goal — judged by an internal agent built from the host's model
concise = GoalLoop(
    goal="At most 3 sentences, accessible to a 10-year-old, no jargon.",
    max_attempts=3,
)

agent = Agent(plugins=[concise])
agent("Explain how rainbows form.")
print(concise.last_result(agent))
# GoalResult(passed=True, stop_reason='satisfied', attempts=[...])

# Programmatic validator — pass a callable to skip the judge agent entirely
from strands.vended_plugins.goal import GoalLoop

def word_count_validator(response, agent):
    text = " ".join(
        block["text"] for block in response["content"] if "text" in block
    )
    words = len(text.split())
    if words <= 50:
        return True
    return {"passed": False, "feedback": f"Too long ({words} words). Cap at 50."}

plugin = GoalLoop(goal=word_count_validator, max_attempts=5, timeout=30.0)

Exported symbols

| Symbol | Kind | Purpose |
| --- | --- | --- |
| `GoalLoop` | Plugin class | Main entry point; attach to an agent via `plugins=[...]` |
| `GoalResult` | Dataclass | Aggregate result with `passed`, `stop_reason`, `attempts` |
| `GoalAttempt` | Dataclass | Per-attempt record: `attempt`, `passed`, `feedback` |
| `GoalStopReason` | Literal type | `"satisfied"` \| `"max_attempts"` \| `"timeout"` |
| `JudgeConfig` | Dataclass | Optional judge tuning: `model`, `system_prompt` |
| `ValidationOutcome` | Dataclass | Canonical validator return: `passed`, `feedback` |
| `Validator` | Protocol | Type for programmatic validator callables |
| `JUDGE_SYSTEM_PROMPT` | str | Default system prompt for the NL judge |
| `JudgeOutcome` | Pydantic model | Structured output schema the judge fills |
| `build_judge_prompt` | Function | Builds the judge input from a goal + transcript |
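To make the result types concrete, the sketch below mirrors the field names listed in the table using local dataclasses. The field types and defaults are assumptions made for illustration; the real definitions live in `strands.vended_plugins.goal` and may differ.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional

# Local mirrors of the exported names; types and defaults are assumed, not copied.
GoalStopReason = Literal["satisfied", "max_attempts", "timeout"]


@dataclass
class GoalAttempt:
    attempt: int
    passed: bool
    feedback: Optional[str] = None


@dataclass
class GoalResult:
    passed: bool
    stop_reason: GoalStopReason
    attempts: List[GoalAttempt] = field(default_factory=list)


result = GoalResult(
    passed=True,
    stop_reason="satisfied",
    attempts=[GoalAttempt(1, False, "Too long"), GoalAttempt(2, True)],
)

# Collect the feedback from every failed attempt.
failed_feedback = [a.feedback for a in result.attempts if not a.passed]
print(failed_feedback)
```

Because dataclasses compare field by field, a test can assert on a whole result object with a single equality check rather than one assertion per field.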

GoalLoop constructor parameters

| Parameter | Default | Description |
| --- | --- | --- |
| `goal` | (required) | NL string (judged by internal agent) or callable validator |
| `max_attempts` | `inf` | Maximum attempts before stopping |
| `timeout` | `inf` | Wall-clock budget in seconds |
| `judge` | `None` | `JudgeConfig` to override the judge model or system prompt |
| `preserve_context` | `True` | Keep conversation history across retries |
| `resume_prompt_template` | (built-in) | `Callable[[str |
| `name` | `"strands:goal-loop"` | Plugin name (must be unique per agent) |

Related Issues

N/A - new feature port

Documentation PR

Included in this PR under site/src/content/docs/user-guide/concepts/plugins/goal-loop.mdx with dual-language tabs (Python + TypeScript).

Type of Change

New feature

Testing

I ran hatch run prepare

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

agent-of-mkmeral · 2026-06-12T09:44:52Z

Re-check at head `7d855d0` (covers `930c808` + `7d855d0`)

Verdict: the port is still faithful, and all four items from my original review are resolved. Re-verified the changed surface against the TS source and ran the suite — 41/41 unit tests pass locally.

✅ The one real fidelity gap is fixed (Ralph-mode `system_prompt` rewind)

run.initial_snapshot = event.agent.take_snapshot(
    preset="session", include=["system_prompt"], exclude=["state"]
)

Verified empirically against the snapshot resolver:

before fix: ['conversation_manager_state', 'interrupt_state', 'messages', 'model_state']
after fix:  ['conversation_manager_state', 'interrupt_state', 'messages', 'model_state', 'system_prompt']

Python now rewinds everything the TS session preset rewinds (plus conversation_manager_state, which is documented inline as an intentional Python-only divergence — exactly what I'd hoped for). The test at test_snapshot_taken_on_first_model_call pins the exact call signature, so this can't silently regress. And the docs claim in goal-loop.mdx ("messages, system prompt, model state") is now accurate for Python — no doc change needed.
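As a way to read the before/after lists, here is a toy resolver that combines a preset with `include` and `exclude`. This is not the SDK's snapshot resolver: the `PRESETS` contents and the rule "add includes, then drop excludes" are assumptions chosen only so that the output matches the two lists quoted above.

```python
from typing import Iterable, List

# ASSUMPTION: hypothetical preset contents, picked to reproduce the quoted lists.
PRESETS = {
    "session": [
        "conversation_manager_state",
        "interrupt_state",
        "messages",
        "model_state",
        "state",
    ],
}


def resolve_fields(
    preset: str, include: Iterable[str] = (), exclude: Iterable[str] = ()
) -> List[str]:
    """Start from the preset, add `include`, then drop `exclude`."""
    fields = set(PRESETS[preset])
    fields.update(include)
    fields.difference_update(exclude)
    return sorted(fields)


before = resolve_fields("session", exclude=["state"])
after = resolve_fields("session", include=["system_prompt"], exclude=["state"])
print(before)
print(after)
```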

✅ Minor items from my review, all addressed

Judge-path unit tests — the previously untested _judge_validator now has 6 mocked tests (test_nl_judge_*: first-attempt pass, feedback loop, judge.model override, judge.system_prompt override, no-structured-output fallback, fresh-agent-per-validation), matching the TS suite's coverage 1:1. The patch target (strands.agent.agent.Agent) correctly intercepts the plugin's lazy import.
GoalStopReason — now Literal["satisfied", "max_attempts", "timeout"].
import inspect — moved to module level.
Bonus: judge.py rendering helpers got real types (Message/ContentBlock/ToolResultContent instead of dict) — behavior unchanged, port fidelity unaffected.
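The point about the patch target can be illustrated without the SDK. In the sketch below, `fake_agent_module` is a hypothetical stand-in for a module like `strands.agent.agent`, and `judge` stands in for code that imports the class lazily inside a function. Because the import runs at call time, patching the attribute on the defining module is enough to intercept it.

```python
import sys
import types
from unittest.mock import MagicMock, patch

# Hypothetical stand-in module, registered so that `import` can find it.
fake_module = types.ModuleType("fake_agent_module")


class Agent:
    def __call__(self, prompt: str) -> str:
        return "real response"


fake_module.Agent = Agent
sys.modules["fake_agent_module"] = fake_module


def judge(prompt: str) -> str:
    # Lazy import: the name is looked up on the module each time judge() runs.
    from fake_agent_module import Agent as JudgeAgent

    return JudgeAgent()(prompt)


unpatched = judge("hi")

with patch("fake_agent_module.Agent") as mock_cls:
    # Instantiating the mocked class yields a callable that returns a canned reply.
    mock_cls.return_value = MagicMock(return_value="mocked response")
    patched = judge("hi")

print(unpatched, patched, mock_cls.call_count)
```

Checking `mock_cls.call_count` after each validation is one way a test could pin a "fresh agent per validation" expectation.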

Port fidelity of the new changes themselves

The Validator → @runtime_checkable Protocol conversion preserves the exact call shape ((response, agent) positionally, return bool | dict | ValidationOutcome, sync or async), so all TS-equivalent behavior in _fn_validator is untouched. The structured-logging reformat changes only the log string, not semantics.
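A minimal sketch of the call shape described above, assuming local stand-ins for `Validator` and `ValidationOutcome` rather than the real exports. `run_validator` is a hypothetical helper, not the plugin's `_fn_validator`; it only shows how sync and async validators returning `bool`, `dict`, or `ValidationOutcome` could be normalized to one type.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Any, Optional, Protocol, runtime_checkable


@dataclass
class ValidationOutcome:
    passed: bool
    feedback: Optional[str] = None


@runtime_checkable
class Validator(Protocol):
    # **kwargs leaves room for extra keyword arguments in later versions.
    def __call__(self, response: Any, agent: Any, **kwargs: Any) -> Any: ...


async def run_validator(validator: Validator, response: Any, agent: Any) -> ValidationOutcome:
    """Call a sync or async validator and normalize whatever it returns."""
    result = validator(response, agent)
    if inspect.isawaitable(result):
        result = await result
    if isinstance(result, ValidationOutcome):
        return result
    if isinstance(result, bool):
        return ValidationOutcome(passed=result)
    if isinstance(result, dict):
        return ValidationOutcome(passed=bool(result["passed"]), feedback=result.get("feedback"))
    raise TypeError(f"Unsupported validator result: {type(result).__name__}")


def sync_validator(response: Any, agent: Any, **kwargs: Any) -> dict:
    return {"passed": False, "feedback": "Too long."}


async def async_validator(response: Any, agent: Any, **kwargs: Any) -> bool:
    return True


is_validator = isinstance(sync_validator, Validator)
sync_outcome = asyncio.run(run_validator(sync_validator, "text", None))
async_outcome = asyncio.run(run_validator(async_validator, "text", None))
print(is_validator, sync_outcome, async_outcome)
```

Note that a runtime-checkable Protocol only verifies that `__call__` exists; it does not check the signature, so type checkers remain the real guard.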

Remaining (non-blocking)

resume_prompt_template is still a frozen Callable — same forward-compat argument as Validator, worth doing while the API is new (the /strands review bot flagged this too).
CI note: all Python unit-test jobs are green across 3.10–3.14 / linux+windows (the CANCELLED entries are superseded runs); the label-size check failure is labeler noise unrelated to this change.

Good to go from a port-fidelity standpoint. 🚢

Port the GoalLoop iterative-refinement plugin from TypeScript to Python. The plugin validates agent responses against a goal (NL string or programmatic validator) and loops with feedback until satisfied. Includes 35 unit tests and a dual-language documentation page.

TypeScript examples must live in sibling .ts files and be included via --8<-- directives, not inlined in MDX. Created goal-loop.ts and goal-loop_imports.ts with proper snippet regions. Updated docs-writer skill to make this convention unmissable: added CRITICAL callout in Step 3b and a new top-level Gotcha.

All three skills (writer, reviewer, audit) now enforce:
- TypeScript is never inlined in MDX
- Imports live in a separate _imports.ts file with per-example regions
- Every TS example must include both imports and body snippets
- A body-only include missing imports fails review

- Reflow goal-loop.mdx prose to fill lines to ~80-90 chars
- Remove language-specific param from heading ("Stateless Retries")
- Restructure reviewer skill: split monolithic Constraints bullet into separate dimensions (Voice Stack, Multi-Language, Terminology, Code Examples with site conventions, Readability, Type Alignment)
- Add heading language-neutrality rule to writer Step 4

Prose outside tabs must be language-neutral. Replaced Python-specific parameter names (preserve_context=False, max_attempts, stop_reason, last_result()) with plain English equivalents. Updated writer skill to make language-neutral shared prose the top-level rule.

…ator
- Decompose build_judge_prompt into small named helpers instead of nested loops
- Rename _nl_validator to _judge_validator for clarity
- Add integration tests mirroring the TS integ suite (standard loop + preserve_context=False)
- Type WeakSet/WeakKeyDictionary with Agent instead of Any
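For readers wondering why weak-reference collections keyed by the agent are a natural fit here, the sketch below keeps per-agent bookkeeping that disappears when the agent is garbage collected. `Agent` and `ResultTracker` are hypothetical stand-ins; this is not the plugin's internal state layout.

```python
import gc
import weakref
from typing import Optional


class Agent:
    """Hypothetical stand-in for the host agent class."""


class ResultTracker:
    def __init__(self) -> None:
        # Entries vanish automatically once the agent has no other references.
        self._results: "weakref.WeakKeyDictionary[Agent, str]" = weakref.WeakKeyDictionary()
        self._active: "weakref.WeakSet[Agent]" = weakref.WeakSet()

    def start(self, agent: Agent) -> None:
        self._active.add(agent)

    def finish(self, agent: Agent, result: str) -> None:
        self._active.discard(agent)
        self._results[agent] = result

    def last_result(self, agent: Agent) -> Optional[str]:
        return self._results.get(agent)

    def tracked(self) -> int:
        return len(self._results)


tracker = ResultTracker()
agent = Agent()
tracker.start(agent)
tracker.finish(agent, "satisfied")
seen = tracker.last_result(agent)

# Dropping the last strong reference lets the tracker forget the agent.
del agent
gc.collect()
remaining = tracker.tracked()
print(seen, remaining)
```

Typing the collections with the agent class instead of `Any` lets a type checker catch a wrong key type at the call sites.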

- Include system_prompt in Ralph-mode snapshot (restores TS parity)
- Use Literal type for GoalStopReason instead of bare str
- Move `import inspect` to module level
- Fix mypy: type judge helpers with Message/ContentBlock/ToolResultContent
- Add NL judge unit tests (construction, feedback, model/prompt overrides, fallback path, fresh-agent-per-validation)
- Remove unused pytest import from integ tests

…ment
- Convert Validator from Callable alias to Protocol with **kwargs for forward-compatible extensibility
- Fix logger.warning to use structured format (plugin=<%s>, error=<%s> |)
- Use full GoalResult/GoalAttempt equality assertions instead of per-field
- Add comment explaining Python snapshot preset divergence from TS
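The structured log format mentioned in this commit can be sketched with the standard `logging` module. The logger name, the helper `warn_validator_error`, and the message text after the `|` are hypothetical; only the `plugin=<%s>, error=<%s> |` prefix comes from the commit message.

```python
import logging

logger = logging.getLogger("goal_loop_sketch")


class ListHandler(logging.Handler):
    """Collects formatted messages so the output can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def warn_validator_error(plugin_name: str, error: Exception) -> None:
    # Field values are passed as lazy %-style arguments: key=<value> pairs, then "|".
    logger.warning(
        "plugin=<%s>, error=<%s> | validator raised an exception",
        plugin_name,
        error,
    )


handler = ListHandler()
logger.addHandler(handler)
warn_validator_error("strands:goal-loop", ValueError("boom"))
print(handler.messages[0])
```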

github-actions · 2026-06-12T21:04:10Z

Documentation Preview Ready

Your documentation preview has been successfully deployed!

Changed pages:

user-guide/concepts/plugins/goal-loop

Updated at: 2026-06-15T15:54:21.146Z

- Remove hook/event implementation details from "How It Works" section
- Convert JUDGE_SYSTEM_PROMPT to triple-quoted string for readability
- Extract TS word_count_validator into a named function (matches Python)
- Normalize variable naming to `plugin` in "Inspecting Results" examples
- Replace Spanish resume prompt examples with English
- Use <Syntax> component for language-specific inline terms
- Add prompt-authoring tag to link goal-loop with steering
- Add explanatory comments for WeakKeyDictionary and WeakSet usage

github-actions · 2026-06-12T22:17:05Z

Assessment: Comment (one merge-gate fix before merge)

Clean, well-documented port with strong test coverage. I verified the suite locally — 41/41 unit tests pass, mypy is clean, and ruff check (lint) is clean. The prior review history (port fidelity, Validator Protocol, structured logging, full-object assertions, snapshot-preset docs) is all addressed, so I focused only on what's new.

What I found

Formatting gate (worth fixing before merge): ruff format --check fails on plugin.py — see the inline comment. CI's ci.yml runs this, so it'll block merge despite the hatch run prepare checkbox. One ruff format run clears it.
Timeout semantics (suggestion): timeout is checked before validation, so a final response that would pass is reported as timeout/not-passed. Inline note suggests validating first or documenting the choice.

Only the formatter is blocking; everything else is non-blocking. Nice work — the docs page and judge/validator separation read really well.
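To make the timeout-semantics suggestion concrete, here is a small sketch comparing the two orderings. `finish_attempt` is a hypothetical function written for this comparison and is not taken from plugin.py; it only shows how the reported stop reason changes when a passing response arrives after the deadline.

```python
from typing import Callable


def finish_attempt(
    elapsed: float,
    timeout: float,
    validate: Callable[[], bool],
    validate_first: bool,
) -> str:
    """Return the stop decision for one attempt under either ordering."""
    timed_out = elapsed >= timeout
    if validate_first:
        # Validation wins: a passing response counts even if it arrived late.
        if validate():
            return "satisfied"
        return "timeout" if timed_out else "retry"
    # Timeout wins: a late response is never validated.
    if timed_out:
        return "timeout"
    return "satisfied" if validate() else "retry"


# A response that would pass, but arrives one second over a 30-second budget.
timeout_first = finish_attempt(31.0, 30.0, lambda: True, validate_first=False)
validation_first = finish_attempt(31.0, 30.0, lambda: True, validate_first=True)
print(timeout_first, validation_first)
```

Checking the timeout first avoids spending a judge call on a run that is already over budget; validating first avoids reporting a usable answer as a failure. Either is defensible as long as the choice is documented.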

github-actions · 2026-06-15T15:54:27Z

Re-review — head `3d6e13a`

Assessment: Comment (one merge-gate fix still outstanding)

The new commit since my last pass is docs-only (the TS custom_judge typecheck fix) and it checks out — Promise<ValidationOutcome> + the as ValidationOutcome cast with a ?? {passed:false,...} fallback correctly mirror the Python judge path.

Still blocking: my earlier inline comment about ruff format --check failing on plugin.py is unaddressed — the docs commit didn't touch it. Re-confirmed on this head:

$ ruff format --check src/strands/vended_plugins/goal/plugin.py
Would reformat: src/strands/vended_plugins/goal/plugin.py

It's the same three spots (blank line before Validator L79 and GoalStopReason L97; the __call__ signature collapses to one line). ci.yml runs this check, so it'll block merge. One ruff format run clears it.

Everything else is green on this head — 41/41 unit tests, ruff check, and mypy all pass. The non-blocking timeout-semantics suggestion is already covered by the docs' Limitations section, so no action needed there.

github-actions Bot added the size/xl label Jun 11, 2026

notowen333 requested a deployment to manual-approval June 11, 2026 18:11 — with GitHub Actions Waiting

notowen333 had a problem deploying to manual-approval June 11, 2026 18:11 — with GitHub Actions Error

notowen333 requested a deployment to manual-approval June 11, 2026 19:35 — with GitHub Actions Waiting

notowen333 had a problem deploying to manual-approval June 11, 2026 19:35 — with GitHub Actions Error

notowen333 requested a deployment to manual-approval June 11, 2026 19:35 — with GitHub Actions Waiting

github-actions Bot added size/xl and removed size/xl labels Jun 11, 2026

notowen333 requested a deployment to manual-approval June 11, 2026 19:38 — with GitHub Actions Waiting

notowen333 had a problem deploying to manual-approval June 11, 2026 19:38 — with GitHub Actions Error

notowen333 requested a deployment to manual-approval June 11, 2026 19:38 — with GitHub Actions Waiting

github-actions Bot added size/xl and removed size/xl labels Jun 11, 2026

notowen333 requested a deployment to manual-approval June 11, 2026 19:43 — with GitHub Actions Waiting

notowen333 had a problem deploying to manual-approval June 11, 2026 19:43 — with GitHub Actions Error

notowen333 requested a deployment to manual-approval June 11, 2026 19:43 — with GitHub Actions Waiting

github-actions Bot added size/xl and removed size/xl labels Jun 11, 2026

notowen333 requested a deployment to manual-approval June 11, 2026 19:45 — with GitHub Actions Waiting

notowen333 had a problem deploying to manual-approval June 11, 2026 19:45 — with GitHub Actions Error

notowen333 requested a deployment to manual-approval June 11, 2026 19:45 — with GitHub Actions Waiting

github-actions Bot removed the size/xl label Jun 11, 2026

notowen333 added 15 commits June 12, 2026 16:56

fix: use mermaid for diagrams, update skills to enforce it

f3652e2

Replace ASCII box-drawing diagram with a mermaid flowchart. Add mermaid requirement to docs-writer and docs-reviewer skills.

fix: remove unnecessary callout boxes from goal-loop page

c083530

Both were plain facts that belong as inline prose, not visually loud admonitions. The caution described behavior the plugin already warns about; the note was just context.

fix: consolidate callout guidance to mdx-authoring.md reference

a470236

The callout sparing-use rule already lives in mdx-authoring.md. Remove redundant restatements from writer/reviewer skills and instead point to the existing guidance at the right moments.

fix: use proper Agent type instead of Any in weakref collections

caf7c9e

fix: align Python GoalLoop validation behavior with TypeScript

5c0d918

- Reject timeout <= 0 (was allowing 0 which causes immediate timeout)
- Use unicode ellipsis in truncation to match TS output format
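Both changes in this commit are small enough to sketch. The helper names `check_timeout` and `truncate`, the error messages, and the rule that the ellipsis counts toward the length limit are assumptions made for illustration, not the plugin's actual code.

```python
import math


def check_timeout(timeout: float = math.inf) -> float:
    """Reject non-positive budgets; 0 would time out before any attempt."""
    if timeout <= 0:
        raise ValueError(f"timeout must be > 0, got {timeout}")
    return timeout


def truncate(text: str, limit: int) -> str:
    """Shorten text to `limit` characters, ending with a single-character ellipsis."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if len(text) <= limit:
        return text
    # "\u2026" is the one-character unicode ellipsis, not three ASCII dots.
    return text[: limit - 1] + "\u2026"


print(check_timeout(30.0))
print(truncate("abcdef", 4))
```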

refactor: rename 'raw' variable to 'result' in validator normalization

08e7293

fix(site): use valid schema tag in goal-loop frontmatter

0dc0bf7

Replace 'plugins' (not in the collection schema) with 'event-loop'.

zastrowm reviewed Jun 12, 2026

notowen333 added 2 commits June 12, 2026 18:07

fix(docs): use more illustrative custom resume prompt example

cf3935e

The "start over from scratch" prompt shows a real reason to customize — diverging from the default incremental-fix behavior — rather than restating the default in slightly different words.

github-actions Bot reviewed Jun 12, 2026

Comment thread strands-py/src/strands/vended_plugins/goal/plugin.py

github-actions Bot reviewed Jun 12, 2026

Comment thread strands-py/src/strands/vended_plugins/goal/plugin.py

zastrowm previously approved these changes Jun 13, 2026

fix(docs): fix TypeScript snippet typecheck error in custom judge exa…

3d6e13a

…mple Explicitly type the validator return as Promise<ValidationOutcome> and cast structuredOutput to satisfy the Validator type constraint.

zastrowm approved these changes Jun 15, 2026

This was referenced Jun 15, 2026

chore: add CLAUDE.md files #2809

Merged

feat: add default for memory extraction trigger #2811

Merged

fix: mark internal memory functions as private #2817

Merged

notowen333 merged 18 commits into strands-agents:main from notowen333:python-goal-plugin-with-docs

3 participants
